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The implementation of optimal statistical inference protocols for high-dimensional quantum sys¬ 
tems is often computationally expensive. To avoid the difficulties associated with optimal techniques, 
here I propose an alternative approach to quantum estimation and detection based on Volterra fil¬ 
ters. Volterra filters have a clear hierarchy of computational complexities and performances, depend 
only on finite-order correlation functions, and are applicable to systems with no simple Markovian 
model. These features make Volterra filters appealing alternatives to optimal nonlinear protocols for 
the inference and control of complex quantum systems. Applications of the first-order Volterra filter 
to continuous-time quantum filtering, the derivation of a Heisenberg-picture uncertainty relation, 
quantum state tomography, and qubit readout are discussed. 


I. INTRODUCTION 

The advance of quantum technologies relies on our 
ability to measure and control complex quantum sys¬ 
tems. An important task in quantum control is to in¬ 
fer unknown variables from the noisy measurements of 
a quantum system. Examples include the prediction of 
quantum dynamics for measurement-based feedback con¬ 
trol [1-5] and the estimation and detection of weak sig¬ 
nals [6-23]. To implement the signal processing for such 
tasks, a Bayesian decision-theoretic formulation of op¬ 
timal quantum statistical inference is now well estab¬ 
lished [1-7, 17-19]. The quantum filtering theory pio¬ 
neered by Belavkin [24, 25] for the optimal prediction of 
quantum dynamics has especially been hailed as a semi¬ 
nal achievement in quantum control theory; its applica¬ 
tions to measurement-based cooling [26], squeezing [27], 
state preparation [28], quantum error correction [29, 30], 
qubit readout [20-22], and quantum state tomography 
[11-14] in atomic, optical, optomechanical, condensed- 
matter, and superconducting-microwave-circuit systems 
[1] have been studied extensively in the literature. 

Although optimal quantum inference has been suc¬ 
cessful experimentally for low-dimensional systems, such 
as qubits [31] and few-photon systems [32], as well as 
near-Gaussian systems, such as optical phase estimation 
[23] and optomechanics [33] , its implementation for high¬ 
dimensional non-Gaussian quantum systems is beset with 
difficulties in practice. An exact implementation of the 
quantum Bayes rule [2] for optimal inference requires nu¬ 
merical updates of the posterior density matrix based on 
the measurement record. Except for special cases such 
as Gaussian systems [1], the number of elements needed 
to keep track of the density matrix scales exponentially 
with the degrees of freedom, making the implementation 
prohibitive for many-body non-Gaussian systems. This 
problem, known as the curse of dimensionality, means 
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that approximations must often be sought [26, 29, 30, 34- 
37]. Gurrent approximation techniques for dynamical 
systems include Gaussian approximations [13, 26, 34], 
phase-space particle filters [36], Hilbert-space truncation 
[30, 35], and manifold learning [37], but these techniques 
provide little assurance about their actual errors and of¬ 
ten remain too expensive to compute for real-time con¬ 
trol of high-dimensional systems. Another problem with 
optimal inference and the associated stochastic-master- 
equation approach is its reliance on a Markovian model, 
which is difficult to use for many complex systems, espe¬ 
cially those with 1// or fractional noise statistics. With 
the ongoing trend of increasing complexity in quantum 
experiments, not only with condensed matter but also 
with optomechanics [38], atomic ensembles [39], and su¬ 
perconducting circuits [40] , optimal inference is becoming 
an unattainable goal in practice. 


Against this backdrop, here I propose an alternative 
approach to quantum estimation and detection based on 
Volterra filters. Instead of seeking absolute optimality, 
Volterra filters are a class of polynomial estimators with 
a clear hierarchy of computational complexities and esti¬ 
mation errors [41]. Their applications to quantum esti¬ 
mation and detection promise to solve many of the practi¬ 
cal problems associated with optimal quantum inference, 
including the curse of dimensionality, the lack of error as¬ 
surances upon approximations, and the need for a Marko¬ 
vian model. The filter errors also provide a set of up¬ 
per error bounds on the Bayesian quantum Cramer-Rao 
[6, 7, 42], Ziv-Zakai [43], and Helstrom [6, 7, 44] bounds, 
forming novel hierarchies of fundamental uncertainty re¬ 
lations and may be of independent foundational interest. 
The Volterra series has recently been used to model the 
input-output relations of a quantum system [45] , but my 
focus here is different and concerns the estimation of hid¬ 
den observables and hypothesis testing given the output 
measurement record. 
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II. QUANTUM ESTIMATION 
A. Formalism 

Consider a quantum system in the Heisenberg picture 
with initial density operator p. Let 


y = 


( 2 /( 1 ) \ 
2 /( 2 ) 


V viK) / 


( 2 . 1 ) 


be a column vector of observables under measurement. 
For example, y can be the observables of an output 
optical field under homodyne, heterodyne, or photon¬ 
counting measurements. Given a measurement record 
of y, the goal of quantum estimation is to infer a column 
vector of hidden observables 


X = 


( a;(l) \ 

a;(2) 

V *('^)) 


( 2 . 2 ) 


For example, x can be the observables of a quantum sys¬ 
tem that has interacted with the optical field, such as the 
position of a quantum mechanical oscillator or a spin op¬ 
erator of an atomic ensemble, and the goal of the estima¬ 
tion is to infer x given the measurement record. Quantum 
estimation is usually framed in the Schrodinger picture 
via the concept of posterior density operator [1, 2], but it 
can be shown to be equivalent to the Heisenberg-picture 
approach adopted here [4] . This task is especially impor¬ 
tant for measurement-based feedback control [1], such as 
measurement-based cooling and squeezing, to gain real¬ 
time information about quantum degrees of freedom and 
to reduce their uncertainties via feedback control. Ex¬ 
periments that implement quantum estimation have been 
reported in Refs. [31-33] for example. 

The estimation error has a well-defined decision- 
theoretic meaning if all the x and y operators commute 
with one another, such that x and y can be jointly mea¬ 
sured and treated as classical random variables in the 
same probability space [4, 7, 46]. This assumption is ap¬ 
plicable to a wide range of scenarios, including quantum 
filtering [4, 46] and the estimation of any classical param¬ 
eter or waveform coupled to a quantum system [17, 47]. 

Since x and y are compatible observables, the rest of 
the estimation theory is identical to the classical treat¬ 
ment [41], Let x{j\y) be an estimator of x{j) given y, 
and assume that the estimator is given by the truncated 
Volterra series, viz., 

p 

^(j|2/)=X! X! ^p(j,^i,fc2,---,fcp|6') 

p^O l<ki<k2<---<kp<K 

X y{ki)y{k 2 ).. .y{kp), (2.3) 


where 0 is a vector of tunable parameters, P is the order 
of the series and quantifies the complexity of the filter, 
and the zeroth-order term is simply a constant ho(j) and 
does not depend on y. For P —^ oo, the series can be 
regarded as the Taylor series for an arbitrary estimator, 
although I will focus on finite P. 

A useful trick to simplify the notations is to define the 
set of all products of y elements up to order P as 


= {l,y,y' 


®2 




(2.4) 


where 


y’^P = {yiki)y{k2)...y{kp); 

^ ^ < k2 < ■ ■ ■ < kp < K} (2.5) 

is the set of all pth-order products of y elements. Then 
the Volterra series in Eq. (2.3) can be rewritten as 




( 2 . 6 ) 


where is a linear filter with respect to y^^^ but equiv¬ 
alent to the Volterra hlter that is nonlinear with respect 
to y, and ^ is a composite index that goes through all 
elements in y^^\ 

Define 


{fix,y)) = tT[pf{x,y)] 


(2.7) 


as the expectation of any function of x and y, with tr 
denoting the operator trace. Let the error covariance 
matrix be 


E(j, k) = {[x{j) - x{j\y)\ [x(k) - x{k\y)]) . (2.8) 

The absolutely minimum mean-square error for arbitrary 
estimators is achieved by the conditional expectation of 
X given y [4]. For the optimal filtering and prediction of 
quantum observables for example, the usual method is to 
compute the posterior density operator p{y) conditioned 
on the measurement record y in the Schrodinger picture 
using the Kraus operators that characterize the measure¬ 
ments [1, 2], and then take the conditional expectation 
given by i(jjy) = tr[xsij)p{y)], with xsij) being the 
Schrodinger picture of x{j). If the continuous-time limit 
is taken, the posterior density operator obeys the cele¬ 
brated stochastic master equation [1-4] first proposed by 
Belavkin [24, 25]. The computation of p{y) suffers from 
the curse of dimensionality however. To restrict the com¬ 
plexity, consider here instead the error of the Pth-order 
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Volterra filter given by 




x{j) 

M 


where 


x{k)-'^h^^'>{k,y\e)y^^\y) 


(2.9) 


= C^j, k)-Y, O; {k, y) 


- ^ h^^\k, y\e)C^y(P) (j, y) 

+ X! (fc, J^|6»)C'y(p) (^i, v), 






(2.10) 

C.{fk) = 

{x{j)x{k )), 

(2.11) 

C^yip) (j, y) = 

{x{j)y^^Hy)) , 

(2.12) 

Cyi.p)iy,t^) = 

{y^P){y)yiP){i3)). 

(2.13) 


To optimize the Volterra filter, one can seek the pa¬ 
rameters 9 that minimize any desired component of 
in Eq. (2.10), which has the remarkable 
feature of depending only on finite-order correlations. 
Specifically, C^y{p) (j, y) depends on the correlation be¬ 
tween x{j) and products of y elements up to the Pth or¬ 
der, and CyiP) depends on the correlations among y up to 
the 2Pth order. Stationarity assumptions and frequency- 
domain techniques can further simplify the expressions. 

Quantum mechanics comes into the problem through 
the correlations. They must obey uncertainty relations 
with other incompatible observables [7, 48] . They can vi¬ 
olate Bell [49] and Leggett-Garg [50] inequalities, requir¬ 
ing different probability spaces for different experimen¬ 
tal settings. They may result from nontrivial internal 
quantum dynamics with no classical correspondence; the 
promise of quantum computation and simulation [51] is 
in fact based on the difficulty of reproducing quantum dy¬ 
namical statistics using any hidden-variable model. This 
difficulty also means that attempts to simplify quantum 
filters via classical models [26, 34, 36] are likely to be 
inaccurate for highly nonclassical systems. The Volterra 
filters sidestep the issue via a manifestly non-Markovian 
approach that does not require an online simulation of 
the internal quantum dynamics. The identihcation of the 
correlations and the filter synthesis, though nontrivial, 
can be done offline for control applications. 

A challenge for classical applications of Volterra filters 
is that the correlations are often difficult to model or mea¬ 
sure in practice, but it is less problematic for quantum 
systems: computing and measuring correlation functions 
is already a major endeavor in condensed-matter physics 
[52] and early quantum optics [53] with an extensive lit¬ 
erature. The Volterra-series approach to input-output 


analysis [45] should also help their simulation. Com¬ 
pared with the stochastic-master-equation approach [1- 
4], the use of correlation functions has the advantage of 
not requiring a Markovian model or stochastic calculus, 
although the Volterra filters may require a longer memory 
depending on the time scales of the correlation functions 
and the signal-to-noise properties. An empirical alter¬ 
native to prior system identihcation is to train the hlter 
directly using experimental or simulated data to mini¬ 
mize the sample errors. 

I now consider the ideal case where arbitrary Volterra 
hlters can be implemented, such that the tunable param¬ 
eters 9 are all elements of Since is quadratic 
with respect to the minimization can be performed 
analytically. Dehne the risk function [54] to be minimized 
as 


R{9) = (j, k\9)u{k), (2.14) 

3,k 


where u is an arbitrary real vector. The optimal Volterra 
hlter 


= argmini?(h*-^h (2-15) 

for arbitary u satishes the equation 

C:,y(p)ij,i^) = '^h^^\j,y)Cy(P){y,iy), (2.16) 

which is a system of linear equations with respect to 
and can be solved by conventional methods, and the re¬ 
sulting error covariance matrix is 

E(^)(j,fc) = s(^)(j,fc|h(^)) (2.17) 

= C^ij,k) -'^U^\j,y)C^yiP){k,y). 

(2.18) 

This error can be computed offline to evaluate the op¬ 
timal performance of a Volterra filter and the trade-off 
between the error and the filter complexity P. Going to 
a higher order is guaranteed not to increase the error, 
since E(^) < EW) if P > Q (a higher-order hlter can 
always achieve the performance of a lower-order hlter by 
ignoring the higher-order terms in As the inhnite- 

order Volterra hlter can be regarded as the Taylor series 
for an arbitrary function, will be the optimal among 
arbitrary estimators and E(°°) will coincide with the ab¬ 
solutely optimal error. E^^) thus provides a hierarchy of 
increasingly tight upper error bounds for optimal quan¬ 
tum inference. Most importantly, a finite-order Volterra 
hlter can still enjoy a performance given by Eq. (2.18) 
for any statistics, even if it is not optimal in the abso¬ 
lute sense. On a fundamental level, it is interesting to 
note that, if x is classical, the upper error bounds also 
apply to the Bayesian quantum Gramer-Rao [6, 7, 42] 
and Ziv-Zakai [43] lower error bounds, forming a novel 
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set of operationally motivated uncertainty relations; an 
example is shown in Sec. IIC. 

The optimal P = 0 Volterra filter does not process the 
measurement and is simply given by the prior expecta¬ 
tion {x). The P = 1 Volterra filter is a linear filter with 
respect to y and deserves special attention, as it is the 
simplest Volterra filter beyond the trivial zeroth-order 
case and will likely become the most popular. If x and 
y are jointly Gaussian, the optimal linear filter is also 
the optimal among arbitrary estimators and equivalent 
to the Kalman filter when applied to the prediction of 
Markovian dynamical systems [55], but the linear filter 
can still be used for any non-Gaussian or non-Markovian 
statistics and depends only on the second-order correla¬ 
tions in terms of x and y. 


B. Continuous-time quantum filtering 


where 

a(t,<) = (x2(t)), (2.23) 

Cxy{t,T) = {x{t)y{T)) , (2.24) 

Cyit,T) = {y{t)y{T)) (2.25) 

are the only correlation functions needed to compute 

both the filter and the error. Although this form of the 
optimal linear estimator is known in the classical con¬ 
text [55], its applicability to quantum systems with any 
nonlinear dynamics and non-Gaussian statistics is hith¬ 
erto unappreciated. Gompared with the stochastic mas¬ 
ter equation, the linear filter can be more easily imple¬ 
mented using fast digital electronics or even analog elec¬ 
tronics in practice [23, 56] for measurement-based feed¬ 
back control, while the implementation of higher-order 
filters is more involved but can leverage existing digital- 
signal-processing techniques [41]. 


For example, consider the continuous-time quantum 
filtering and prediction problem, which is to estimate a 
Heisenberg-picture observable x(t) given the past mea¬ 
surement record {2/(r);<o < t < T < t} [A], It can be 
shown that all the Heisenberg-picture operators under 
consideration commute with one another under rather 
general conditions for filtering and prediction [4, 46]. If 
t < T is desired for smoothing [17], care should be taken 
in the modeling to ensure that x{t) still commutes with 
y and an operational meaning of the estimation error ex¬ 
ists. For example, a c-number signal, such as a classical 
force, commutes with all operators by definition. 

To transition from the discrete formalism to continous 
time, define a discrete time given by 

tj=to+j6t, (2.19) 

with initial time tg, integer j, and time interval St. 
For infinitesimal St, the linear P = 1 estimator in the 
continuous-time limit becomes 

x{t\y) = ho{t) + f dThi{t,T)y{T), (2.20) 

Jto 


C. Heisenberg-picture uncertainty relation 

To demonstrate a side consequence of the Volterra- 
filter formalism, here 1 use the analytic error expression 
for the first-order Volterra filter to derive a quantum un¬ 
certainty relation for Heisenberg-picture operators. Gon- 
sider the Hamiltonian ij(t) = joo(0 ~ Qx{t), where g is a 
canonical position operator, x{t) is a classical force, and 
Ijo is the rest of the Hamiltonian. Suppose that j^o is 
at most quadratic with respect to canonical position and 
momentum operators, such that the equations of motion 
for those operators in the Heisenberg picture are linear. 
The initial density operator p, on the other hand, can 
have any non-Gaussian statistics. 

Gonsider an output field quadrature operator y(t) that 
commutes with itself at different times in the Heisenberg 
picture [4]. For example, it can model the homodyne 
measurement of an output optical field in optomechanics. 
It can be shown that 

y{t) = yo{t)[ dtg{t,T)x{T), (2.26) 

where 


where x{t\y), ho(t), hi{t,T), and ?/(t) are continuous¬ 
time versions of x{j\y), hQ{j), hi{j,k)/St, and y{k), re¬ 
spectively. Eq. (2.20) is a continuous-time limit of the 
Volterra series in Eq. (2.3) for P = 1. Assuming zero- 
mean X and y for simplicity and using Eqs. (2.16) and 
(2.18), the optimal linear filter hi{t,T) and the corre¬ 
sponding mean-square error E*^^^(t,t) can be expressed 
as 


Cxy{t,T) = / dshi{t,s)Cy{s,T), 

Jto 

f]^^'>(t,t) = Cxit,t) - f dThi{t,T)Cxy{t,T), 
Jto 


( 2 . 21 ) 

( 2 . 22 ) 


/ U2/o(i):9o(T)], t>T, 

1 0, t<T, 


(2.27) 


is the causal c-number commutator and the subscript 0 
denotes the interaction picture with respect to the Hamil¬ 
tonian Sjq. 

Without loss of generality, assume that x{t), yo{t), and 
qo{t) are zero-mean processes. Gonsider the estimation 
of x{t) using the record {j/(t);0 < t < T}. If yo{t) has 
non-Gaussian statistics, the optimal nonlinear estimator 
is difficult to derive, but the first-order Volterra filter 
given by 

HAv) = [ dThi{t,r)y{r) (2.28) 

^0 
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can be analyzed more easily. To proceed, it is more con¬ 
venient to consider discrete time as defined in Eq. (2.19). 
Regarding x, y^, y, and x as column vectors and g and 
hi as matrices, Eqs. (2.26) and (2.28) can be rewritten 
in matrix form as 


y = yo + stgx, (2.29) 

X = 6thiy. (2.30) 

With covariance matrices defined as 

a = {xx'^) , (2.31) 

Cyo = {yoy^), (2.32) 

C,y = {xy^) =dtC,g^, (2.33) 

Cy = (yy^) = St^gC^g^ + Cyo, (2.34) 


where T denotes the matrix transpose, the optimal linear 
filter becomes 

dthi = Ca;yCy^ = StC^g^ {St'^gC^g^ + Cyo) ^, (2.35) 
and the error covariance matrix becomes 

= (^(^x — Sthiy'j (^x — Sthiy'^ ^ (2.36) 

= a - SfhiCjy (2.37) 

= Cj:- St^C^g"' {St^gC^g^ + Cyo) ^ gC^ (2.38) 
= {C-^+St^g^C;Jgy\ (2.39) 

where the last line uses the matrix inversion lemma [57]. 

The error covariance can be compared with the 
Bayesian quantum Cramer-Rao bound derived in 
Ref. [42]. The quantum bound for Gaussian x results 
in a matrix inequality given by 

S(') > (cy + , (2.40) 


where 


Cqo{tj,tk) = ^ {qo{tj)qo{tk) + qo{tk)qoitj)) ■ (2.41) 

Unlike x{t) and y{t), q{t) may not self-commute at differ¬ 
ent times, and the symmetric ordering in the covariance 
function [58] arises naturally from the derivation of the 
quantum bound in Ref. [42]. Comparing Eq. (2.39) and 
Eq. (2.40), it can be seen that the inequality holds only 
if 

9 CyQ g < j^Cqo, (2.42) 

which is a matrix uncertainty relation between two quan¬ 
tum processes in the Heisenberg picture involving their 
causal commutator g. Note that j/o and q are canonical 
phase-space coordinate operators with linear dynamics 
but need not have Gaussian statistics. The end result 


does not involve x and can be applied to any quantum 
system that satishes the stated assumptions beyond the 
estimation scenario. The estimation procedure nonethe¬ 
less gives the relation a clear operational meaning. 

Eq. (2.42) can be further simplified by assuming linear¬ 
time-invariant dynamics and stationary statistics. The 
result in the continuous long-time limit is a spectral un¬ 
certainty relation given by 


SyoiySqoy) > ^|G(a;)|2, (2.43) 

with the frequency-domain quantities defined by 

/ OO 

■^Syo{uj)exp[ioj{t-T)], (2.44) 

/ OO 7 

-^Sqo{oj)exp[iuj{t-T)], (2.45) 

/ OO 7 

■^G{uj)exp[iu}{t-T)]. (2.46) 

The spectral relation imposes a lower bound on the noise 
floor of an output operator yoit) in terms of the spectrum 
of a noncommuting operator qo{t). For example, the re¬ 
lation can be used to determine the fundamental limit to 
the noise floor of optical homodyne detection as a func¬ 
tion of the mechanical-position power spectral density for 
a gravitational-wave detector [8, 59]. The inequality can 
be saturated if the quantum statistics are Gaussian [42] . 


D. Quantum state tomography 

For an application in quantum information processing, 
consider the estimation of parameters in a density matrix, 
also known as quantum state tomography [9-15]. Assume 
a d X d density matrix of the form 

^ d^-l 

Pz = ^ + ZaEa, (2.47) 

a — 1 

where / is the identity matrix, Ea is a set of Hermitian, 
traceless, and orthonormal matrices that satisfy 

Ea = El, tTEa=0, tTEaE/B = Sap, (2.48) 

and z is a column vector of real unknown parameters. 
Pz is Hermitian and tr = 1 by construction, and the 
density matrix describes a physical quantum state only if 
Pz is positive-semidefinite [11]. Measurements can often 
be modeled as [11] 


y = Az + yo, (2.49) 

where y is a column vector, A is a known measurement 
matrix, and yo is a zero-mean noise vector. The main dif¬ 
ficulty with the Bayesian estimation protocol [10, 13, 15] 
is that, owing to the physical-state requirement, the prior 
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for z is highly non-Gaussian, while the statistics of yo may 
also be non-Gaussian. With the non-Gaussian statistics 
and d scaling exponentially with the degrees of freedom, 
exact Bayesian estimation of z would suffer from the 
curse of dimensionality. Existing approximation tech¬ 
niques include Gaussian approximations [13] and particle 
filters [15], but their actual estimation errors remain un¬ 
clear. 

The Volterra hlters can be used despite the non- 
Gaussianity of z or yo- Let 

x = Bz (2.50) 

be a column vector of parameters to be estimated for a 
given sampling matrix B. Note that B can be a non¬ 
square matrix and the number of elements in x can be 
much smaller than that in z if the dimensionality of the 
latter is a concern. For example, the hdelity between 
the density matrix and a target pure state [60] can be 
expressed in this way, in which case i? is a row vector 
and a; is a scalar. The optimal first-order filter can be 
expressed as 


B (z) + hi{y- A (z)), 

(2.51) 

BCzA^ {ACzA^ + Cyoy\ 

(2.52) 

(zz^)-(z) (z)^. 

(2.53) 


The filter is guaranteed to offer an error covariance ma¬ 
trix given by 

S(i) = B [C-^ + A^Cyo^Ay^ B^. (2.54) 

The linear complexity and the error guarantee are the 
main advantages of the Volterra filter. A shortcoming 
is that, due to noise and the lack of a constraint in the 
algorithm, the estimate x may not lead to a positive- 
semidefinite density matrix. If this is a problem, an ob¬ 
vious remedy is to Hnd the physical x closest to x with 
respect to a distance measure. A more sophisticated way 
is to compute the posterior distribution over a region near 
X with a volume suggested by If the noise is low 

enough or the number of trials is large enough such that 
is small, the region needs to cover a small parame¬ 
ter subspace only, and the curse of dimensionality can be 
avoided. 

The remaining issue is the choice of prior (z) and 
in an objective manner. One option is to take one of 
the commonly used objective priors for z [11, 15] and 
compute its moments. For d = 2 and z being the Bloch 
vector, the prior moments can be easily calculated by 
taking advantage of the Bloch spherical symmetry. The 
computation seems nontrivial for d > 3, but for each d it 
needs to be done just once and for all. 

The most conservative and arguably paranoid option 
is to choose a prior that is least favorable to the Volterra 
filter. Given a prior probability measure on z, one 
can define a risk function, such as the Hilbert-Schmidt 
distance given by 

i?(^^) =trS(L(^^). (2.55) 


Then the least favorable prior is one that maximizes the 
risk while still observing the physical constraint on 
that is, 

arg max R['Kz)- (2.56) 

'’^x',Pz>0 

Note that this prior depends in general on the measure¬ 
ment matrix A as well as the sampling matrix B. With¬ 
out the physical constraint, the least favorable would 
be infinite, giving 

< B {A^CyAy^ B^, (2.57) 

and the Volterra filter would become equivalent to the 
unconstrained maximum-likelihood estimator for Gaus¬ 
sian yg. The effect of a hnite Cz is to pull the estimate 
from the maximum-likelihood value towards the prior (x) 
via the weighted average given by Eq. (2.51). 

III. QUANTUM DETECTION 
A. Formalism 

Assume two hypotheses denoted by Hq and "Hi. These 
hypotheses can be about the initial density operator as 
well as the dynamics and measurements of the quantum 
system [18]. As before, let the measured Heisenberg- 
picture observables be y with commuting elements under 
both hypotheses. The goal of detection is equivalent to 
binary hypothesis testing, which is to make a decision on 
"Ho or Hi based on y. Applications include force detec¬ 
tion [8, 18, 44], fundamenal tests of quantum mechanics 
[18, 19, 38, 61], quantum error correction [1, 29], and 
qubit readout [20-22] . Prior work on the use of Volterra 
filters for classical detection focuses on the heuristic de¬ 
flection criterion [62, 63], but it does not seem to have 
any decision-theoretic meaning or relationship with the 
more rigorous criteria of error probabilities [63]. Here I 
propose a similar performance criterion that is able to 
provide an upper bound on the average error probability, 
while still offering a simple design rule for the Volterra 
filters. To my knowledge the proposed design rule is new 
also in the context of classical detection theory. 

Let A(y) be a test statistic as a polynomial function of 
y similar to Eq. (2.3). For later notational convenience, 
I will rewrite it as 

Xiy) = ho + H^Y, (3.1) 

where the zeroth-order term ho is written separately, Y 
is a column vector with the elements in {y, y®^,..., y®^} 
without the constant term 1, 77 is a column vector with 
the corresponding elements in and T denotes the 

transpose. Let (/(y))o be the expectation of a function 
of y given hypothesis Ho, and (/(y))i be the expectation 
given hypothesis Hi. Note that the hypotheses can be 
about the initial density operator, the dynamics, and the 
definition of y. 
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I demand the test statistic to have different expecta¬ 
tions for the two hypotheses, viz., 

(A)o (A)i . (3.2) 

This means that the order P cannot be arbitrary but 
must be high enough to result in different expectations. I 
further demand the expectations to be symmetric around 
0, viz.. 


are the conditional covariance matrices. 1/7?. can be re¬ 
garded an output signal-to-noise ratio and has a similar 
form to the deflection criterion [62, 63], although 7? has 
a clearer decision-theoretic meaning as an upper error 
bound. 

The purpose of using 7? rather than Ve or Q is to de¬ 
fine an easy-to-optimize criterion in terms of finite-order 
correlations. To find the 7?-optimal filter, consider the 
Cauchy-Schwarz inequality 


(A)o + (A)i=0. (3.3) 

This is accomplished by setting 

ho = -H^Y, (3.4) 

Y=^-{{¥), + {¥),), (3.5) 

resulting in 

(A)i = -(A)o = i7TA, (3.6) 

A^i((y),-(y)o). (3.7) 

Without loss of generality, I assume (A)i = A > 0. 
Consider a threshold test that decides on 77o if A < 0 and 
Til if A > 0. This is commonly expressed as [55] 


X{y) t 0. (3.8) 

'H.q 

The average error probability becomes 

Ve{H) = TTo (1 a>o(?/))o + (f A<o(?/))i, (3.9) 

where ttq and tti are the prior probabilities for the hy¬ 
potheses and 1a>o and 1a<o are indicator functions. 
Since in general depends on infinite orders of A 

moments, I appeal to the Cantelli inequality [64] to ob¬ 
tain 


(A")n - (A)o 

(1a>o(2/))o < - A2\ ^ (3-10) 

/o 

and similarly for (1a<o(?/))i- This leads to upper bounds 
on 'Pe given by 


V,iH) < Q(77) < 7?(77), 

(3.11) 

~ 1 + im A)y(H^CoH) 


+ 

l + (77TA)2/(i7TCii7)’ 

(3.12) 

,r,tTT\ {t^oCo ¥ 

nH) - 

(3.13) 

where 



(3.14) 

Ci^{YY^)^-{Y),{Y)J 

(3.15) 


{H^Af < (A^M-^A) (3.16) 

for any positive-definite matrix M. The inequality is 
saturated if and only if 77 = aM~^A for any constant a. 
Setting M = ttoCq -I- ttiCi, I obtain 

H = argmin7?(77) = a{'KoCo + niCi)~^A, (3.18) 

H 

and the 7?-optimal test statistic A(y) = ho+H^Y^ taking 
a = 1 without loss of generality, becomes 

\{y) = A^i^oCo + TTiCi)-! {¥-¥), (3.19) 

which can then be used in a threshold test. The mer¬ 
its of this approach are similar to those in the estima¬ 
tion scenario: dependence of \{y) on finite-order cor¬ 
relations A, Co, and Ci without relying on a Marko¬ 
vian model, a performance guaranteed by upper bounds 
'Pe{H) < Q{H) < a (the actual 'Pe may be much lower), 
and a hierarchy of decreasing R versus increasing com¬ 
plexity. For the study of fundamental quantum metrol¬ 
ogy, P’e{H), Q{H), and TZ also provide a set of upper 
bounds on the Helstrom bound [6, 7, 44]. 

It is not difficult to show that, if the hypotheses are 
about the mean of a Gaussian y and Co = Ci, A(y) for 
P = 1 coincides with the well known matched filter, and 
the threshold test of X{y) against 0 leads to the optimal 
Pe among all decision rules if ttq = tti [55]. The deriva¬ 
tion of the 7?-optimal Volterra filter here in fact resem¬ 
bles the historic derivation of the linear matched filter 
via maximizing an output signal-to-noise ratio [55]. The 
crucial differences are that here X{y) can include higher- 
order products of y elements and the upper error bounds 
provide performance guarantees even for non-Gaussian 
statistics. 


B. Qubit readout 

For an application of the detection theory, consider the 
qubit readout problem described in Refs. [20-22]. The 
goal is to infer the initial state of the qubit in one of 
the two possibilities from noisy measurements. The two 
hypotheses can be modeled as 

Ho ■ y{tk) = yoitk), 

Hi : y{tk) = Sx{tk) + yoitk), 


(3.20) 








where a; is a hidden qubit observable that can undergo 
spontaneous decay or excitation in time, S' is a positive 
signal amplitude, and yo is a zero-mean noise process. To 
perform hypothesis testing given a record of y, consider 
the first-order 7?,-optimal decision rule given by 


H = {ttqCo + TTiCi)-^ A, (3.21) 

A(y) =H^{y- y) 0, (3.22) 

I-Lq 

where 

^=\{{v)i-{y)o), (3-23) 

y = \{{y)i + {y)o), (3-24) 

Co = (yy^)o - (y)o (y)l > (3-25) 

Cl = - (y)i {y)l , (3.26) 


and the upper error bounds are given by Eqs. (3.11)- 
(3.13). 

\{y) for P = 1 is a linear filter with respect to y and 
similar to the linear filters proposed in Ref. [20]. An 
advantage of the P-optimal rule here is that the filter 
H depends only on the first-order moments A{k) and 
y{k) and second-order correlations Cq and Ci. All these 
moments can be simulated or measured directly in an 
experiment without the assumptions of continuous time, 
white Gaussian noise, and uncorrelated signal and noise 
made in prior work. The calculation of H is relatively 
straightforward compared with the numerical optimiza¬ 
tion procedure in Ref. [20] , while Q and TZ provide theo¬ 
retical performance guarantees. The upper bounds may 
be conservative, and a more precise comparison of Ve{H) 
with other linear or nonlinear filters [20-22] will require 
further numerical simulations and experimental tests. 

To proceed further, consider the continuous-time limit. 
For the two-level x G {0,1} process with initial value 
x(0) = 1 and spontaneous decay time Ti studied in 
Refs. [20, 22], it is not difficult [65] to show that the 
mean is 

(a;(t)) =exp , (3.27) 

and the covariance function is 
Cxit,T) = {x{t)x{T)) - {x{t)) {xir)) (3.28) 

-exp(-^^). (3.29) 

For a zero-mean white Gaussian noise with noise power 

n, 


= exp 


max(<, r) 


{yo{t)yo{T)) = uS{t-T). 
The test statistic becomes 

y{t) - I , 



(3.30) 


(3.31) 


and a continuous-time limit of Fq. (3.21) leads to a Fred¬ 
holm integral equation of the second kind [55] given by 


^ (a;(t)) = nh(t)-I-TTiS”^ f dTCx{t,T)h{T). (3.32) 
2 Jo 


Further analytic simplifications may be possible for T —>■ 
oo using Laplace transform, but a numerical solution of 
the Fredholm equation can easily be sought, as it is linear 
with respect to hi and can be inverted in discrete time 
using, for example, the mldivide function in Matlab. 

Define the input signal-to-noise ratio (SNR) as 
S'^Ti/n. Fig. 1 plots some numerical examples of the 
filter for ttq = tti = 1/2 and T = 5Ti. The Matlab 
computation of all the filters shown with 5t = O.OOlTi 
takes seconds to complete on a desktop PG. Fig. 2 plots 
the upper error bounds versus the input SNR. The upper 
bounds turn out to be conservative here, as a numerical 
investigation of Ve later will demonstrate. 


Filter shapes for SNR = 1,10, 20,..., 200 



t/Ti 


FIG. 1. (Color online). The normalized 77-optimal filters 
2Tlhi{t) / S in log scale versus normalized time t/Ti for differ¬ 
ent input SNR = S^Ti/n = 1,10, 20,..., 200. ttq = rri = 1/2 
and T = 5Ti are assumed. The different plots can be distin¬ 
guished by the reducing correlation times for increasing SNR. 


The proposed decision rule can be compared with the 
optimal likelihood-ratio test (LRT) [55]. For the given 
problem, there exists an analytic expression for the log- 
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FIG. 2. (Color online). Upper bounds Q{H) and "R on the 
average error probability Ve for the first-order Volterra filter 
versus input SNR from 10 dB to 30 dB in log-log scale, ttq = 
TTi = 1/2 and T = 5Ti are assumed. Vs is guaranteed to be 
in the shaded region below the curves. 


optimal threshold brings its error probability at input 
SNR = 10^ down to 8.3 x 10“^. A higher-order filter is 
hardly necessary for the SNRs considered here. 


Numerical error probabilities 



likelihood ratio given by [22] 


Ao(y) 

dr]{t) 

x{t) 

Pi{t) 


y{t)dt, 

Pi(t) 

Po(t) +Piit)’ 



exp 


S 

n 


f dr]{T) - t 

Jo 




Po{t) 


1 /■* 

— / dTpi(r), 
Ji Jo 


(3.33) 

(3.34) 

(3.35) 

(3.36) 

(3.37) 


where the drj integrals are in the Ito sense. The optimal 
decision rule is thus 


Ao(?/) ^ In—. (3.38) 

-Ho '^1 

Although the LRT will achieve the lowest Ve, the highly 
nonlinear dependence of Ao on y makes its exact imple¬ 
mentation difficult in real-time applications or for a large 
number of qubits. 

The average error probabilities for both the 77.-optimal 
rule and the LRT are estimated numerically using Monte 
Carlo simulations and plotted in Fig. 3. The errors are 
close at lower input SNR values. Considering the simplic¬ 
ity of the 7?.-optimal rule, the divergence at higher SNR 
is expected and indeed slight. At the input SNR of 10^, 
Ve for LRT is 6.2 x 10“^, while that for the 77.-optimal 
rule is only around a factor of 2 higher at 1.48 x 10“^. 
A further optimization of Ve beyond the results shown 
in Fig. 3 can be done by fine-tuning the threshold of the 
7^-optimal rule. For example, a numerical search for the 


FIG. 3. (Color online). Numerically computed average error 
probabilities Ve for the 7?.-optimal rule and the likelihood- 
ratio test (LRT) versus the input SNR from 10 dB to 30 dB 
in log-log scale, ttq = tti = 1/2 and T = 5Ti are assumed. 
Also shown are parts of the upper bounds Q{H) and TZ for 
comparison. 


The upper bounds depend only on low-order moments 
and apply equally to all problems with the same low- 
order moments, regardless of their higher-order statis¬ 
tics. It is not surprising that such indiscriminate bounds 
are loose for this particular example, as shown in Fig. 3. 
What is surprising is the near-optimal performance of a 
decision rule based on a loose upper bound. The log- 
likelihood ratio is given analytically for the problem con¬ 
sidered here, so one may compare it with the 77.-optimal 
test statistic to see how the two resemble each other. In 
general, however, the log-likelihood ratio is difficult or 
even impossible to compute if the full probability mod¬ 
els are more complicated or simply unidentified. The 
7?.-optimal rule requires only low-order moments to be 
known, and is hence more convenient to implement in 
practice. 


IV. CONCLUSION 

I have proposed the use of Volterra filters for quan¬ 
tum estimation and detection. The importance of the 
proposal lies in its promise to solve many of the practi¬ 
cal problems associated with existing optimal quantum 
inference techniques, including the curse of dimensional¬ 
ity, the lack of performance assurances upon approxima¬ 
tions, and the need for a Markovian model. Beyond the 
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examples of quantum state tomography and qubit read¬ 
out discussed in this paper, diverse applications in quan¬ 
tum information processing [1, 38, 51], including cool¬ 
ing [26], squeezing [27], state preparation [28], metrology 
[6-8, 16-18, 42-44], fundamental tests of quantum me¬ 
chanics [18, 19, 38, 61], and error correction [29, 30], are 
expected to benefit. Potential extensions of the theory 
include adaptive, recursive, and coherent generalizations 
for feedback control [1] and noise cancellation [66], filter 
training via machine learning [67], robustness analysis, 
the use of other performance criteria for improved ro¬ 
bustness [68] or multi-hypothesis testing [18, 19], a con¬ 
nection with Shannon information theory through the 


relations between hltering errors and entropic informa¬ 
tion [69], and a study of fundamental uncertainty rela¬ 
tions in conjunction with quantum lower error bounds 
[6, 7, 16, 42-44], 
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